Abstract

The "Pulse Synchronization" problem can be loosely described as the task of invoking a recurring
distributed event as simultaneously as possible at the different nodes and with a frequency that is as
regular as possible. This target becomes surprisingly subtle and difficult to achieve when facing both
transient and permanent failures. In this paper we present an algorithm for pulse synchronization that
self-stabilizes while at the same time tolerating a permanent presence of Byzantine faults. The Byzantine
nodes might incessantly try to de-synchronize the correct nodes. Transient failures might throw the
system into an arbitrary state in which correct nodes have no common notion whatsoever, such as time
or round numbers, and thus cannot infer anything from their own local states about the state of other
correct nodes. The presented algorithm grants nodes the ability to infer that eventually all correct nodes
will invoke their pulses within a very short time interval of each other and will do so regularly.

Pulse synchronization has previously been shown to be a powerful tool for designing general
self-stabilizing Byzantine algorithms and is hitherto the only method that provides for the general design
of efficient practical protocols in the confluence of these two fault models. The difficulty, in general, of
designing any algorithm in this fault model may be indicated by the remarkably few algorithms resilient
to both fault models. The few published self-stabilizing Byzantine algorithms are typically complicated
and sometimes converge from an arbitrary initial state only after exponential or super-exponential time.

The presented pulse synchronization algorithm will converge by only assuming that eventually the
communication network delivers messages within bounded, say d, time units, and the number of Byzantine
nodes, f, obeys the n > 3f inequality, for a network of n nodes. The attained pulse synchronization
tightness is 3d with a convergence time of a constant number of pulse cycles (each containing O(f)
communication rounds).
1 Introduction



The difficulty of fault tolerant synchronization: Coordination and synchronization are among the
most fundamental elements of a distributed task. Nodes typically infer the state of the other correct
nodes from their own internal states. In the classic distributed paradigms some extent of initial synchrony
or consistency is always assumed [8]. Even in the classic asynchronous network model, although nothing is
assumed on the time taken for message delivery, it is typically assumed that nodes have a controlled and
common initialization phase. Thus it is assumed that the global state is at least partially consistent,
so that correct nodes have a common notion as to when the system last initialized. This greatly facilitates
the progression of the algorithm in "asynchronous rounds" in which a node knows that if it has commenced 
some specific round r then all other correct nodes have progressed to at least some lesser round. This leads 
to a "state-machine replication" approach as a general framework to address the consistency in a distributed 
environment (see Q3]). Typically, the asynchronous model does not allow for deterministic fault tolerance 
as it might not be possible to distinguish between a late message and a faulty sender (or a lost message). 
In the synchronous network model, nodes may assume bounded time for message delivery (when the system 
is stable) in addition to assuming that nodes have a common initialization phase. These two assumptions 
allow nodes to use timing criteria to deduce whether certain actions should have already taken place. This 
allows for resilience to permanent faults and plays a pivotal role in the ability to tolerate Byzantine nodes. 
Synchronization enables correct nodes to determine whether a certain message received at a certain time or 
with a certain value at this certain time does not agree with the node's perception of the global progress of 
the algorithm. In order for all correct nodes to view symmetrically whether a node does not behave according 
to the protocol, it is required to assume that nodes have similar perceptions of the progress of the algorithm. 

A self-stabilizing algorithm does not assume a common initialization phase. This is required due to 
transient failures that might corrupt the local state of nodes, such as the notion as to how long ago the 
system or algorithm was initialized. The combination of self-stabilization and Byzantine fault tolerance 
poses a special challenge. The difficulty stems from the apparent cyclic paradox of the role of synchronization 
for containing the faulty nodes combined with the fact that a self-stabilizing algorithm cannot assume any 
sort of synchronization or inference of the global state from the local state. Observe that assuming a fully 
synchronous model in which nodes progress in perfect lock-step does not ease this problem (cf. [J]). 

The problem in general is to return to a consistent global state from a corrupted global state. The problem,
as stated through pulse synchronization, is to attain a consistent global state with respect to the pulse event
only, i.e., a correct node can infer that other correct nodes will have invoked their pulses within a
very small time window of its own pulse invocation. Interestingly enough, this type of synchronization
is sufficient for eventually attaining a consistent general global state from any corrupted general global
state [2]. Self-stabilizing Byzantine pulse synchronization is a surprisingly subtle and difficult problem. To
elucidate the difficulties in trying to solve this problem, it may be instructive to outline a flaw in an earlier
attempt to solve it. Non-stabilizing Byzantine algorithms assume that all correct nodes
have symmetric views on the other correct nodes. E.g., if a node received a message from a correct node,
then it is assumed that all correct nodes did so too. Following transient failures, though, a node might initialize
in a spurious state reflecting some spurious messages from correct nodes. With the pulse synchronization 
problem, this spurious state may be enough to trigger a pulse at the node. In order to synchronize their 
pulses nodes need to broadcast that they have invoked a pulse or that they are about to do so. Correct 
nodes need to observe such messages until a certain threshold for invoking a pulse is reached. When nodes 
invoke their pulses this threshold will be reached again subsequent to invoking the pulse, causing a correct 
node to immediately invoke a pulse again and again. 

To prevent incessant pulse invocations, a straightforward solution is to have a large enough period
subsequent to the pulse invocation in which a node does not consider received messages towards the threshold.
This is exactly where the complementary pitfall lies, since some correct nodes may initialize in a state that
causes them to invoke a pulse based on spurious messages from correct nodes. The consequent pulse messages
might then arrive at other correct nodes that initialize in a period in which they do not consider received
messages. Byzantine nodes can, by sending carefully timed messages, cause correct nodes to invoke their
pulses in perfect anti-synchrony forever. It is no trivial task to circumvent these difficulties.
It is interesting to observe that Byzantine (non-stabilizing) pulse synchronization can be trivially derived
from Byzantine clock synchronization. Self-stabilizing (non-Byzantine) pulse synchronization can be easily 
achieved by following any node that invokes a pulse. Self-stabilizing Byzantine pulse synchronization on the 
other hand is apparently an extremely tricky task. 

Pulse Synchronization using Byzantine Agreement: In our model we do not assume any existing
synchrony besides bounded message delivery. In [2] it is proven to be impossible to combine self-stabilization
with even crash faults without the assumption of bounded message delivery. Thus our protocol only assumes
the minimal synchrony required for overcoming crash faults.

The tightly synchronized pulses are produced by utilizing a self-stabilizing Byzantine agreement protocol, 
which we have developed in 0, that does not assume any prior synchronization among the correct nodes. 
Intuitively, synchronizing pulses on top of a classic (non-stabilizing) Byzantine agreement should supposedly 
be rather straightforward: Execute distributed Byzantine agreement on the elapsed time remaining until the 
next pulse invocation. This scheme requires the correct nodes to terminate agreement within a short time 
of each other, but the major issue is that, unfortunately, when facing transient failures, the system may end 
up in a state in which any common reference to time or even common anchor in time might be lost. This 
preempts the use of classic (non-stabilizing) Byzantine agreement and/or reliable broadcast, as these tools
typically assume initialization with a common reference to time or common reference to a round number.
Thus, a common anchor in time is required to execute agreement which aims at attaining and maintaining
a common anchor in time. Hence, what is required is an agreement algorithm that is both self-stabilizing
and Byzantine. We resolve this apparent cyclic paradox by developing in (§] a self-stabilizing Byzantine
agreement algorithm, named SS-Byz-Agree, with a unique technique that is based only on the bound on
message transmission time among correct nodes to "anchor" a relative time reference to each invocation of 
the agreement algorithm. That algorithm is, to the best of our knowledge, the first Byzantine agreement 
algorithm that is also self-stabilizing. 

The system may be in an arbitrary state in which the communication network may behave arbitrarily
and in which there may be an arbitrary number (up to n) of concurrent Byzantine faulty nodes. The pulse
synchronization algorithm will converge once the communication network eventually resumes delivering
messages within bounded, say d, time units, and the number of Byzantine nodes, f, obeys the n > 3f inequality,
for a network of n nodes. The attained pulse synchronization tightness is 3d. We denote by Cycle the targeted
time interval between pulse invocations. The bound on the effective length of the cycle attained is within
O(d) of the targeted length of Cycle. The convergence time is 6 cycles (each containing O(f) communication
rounds, where f is the bound on the number of concurrent permanent faults).

Related work: Pulse synchronization can be trivially derived from clock synchronization, but no practical
self-stabilizing Byzantine clock synchronization algorithm that does not assume the existence of synchronized
pulses exists. The first clock synchronization algorithms that are self-stabilizing and tolerate Byzantine
faults were presented in earlier work. One of these algorithms assumes a common global pulse and converges
in expected exponential time; the other, which does not assume a pulse, converges in expected
super-exponential time. In [H] we developed an efficient and practical self-stabilizing Byzantine clock
synchronization algorithm based on pulse synchronization, though the particular pulse synchronization
procedure presented there suffered from a flaw.¹ The flaw was in neglecting to consider all possible initial
values when the nodes recover after a transient fault. The current paper serves as a replacement for that
pulse procedure. The clock synchronization algorithm of that paper remains largely unaffected, with only a
minor change of the clock precision from 3d to 4d. A novel biologically inspired pulse synchronization
procedure was developed in another work. It has a fundamentally different structure than the current
solution. The current solution converges in 6 cycles, whereas that solution converges in O(f) cycles and has
a higher message complexity. Thus, the current solution scales better with respect to the network size n.
Other work shows how to initialize Byzantine clock synchronization among correct nodes that boot at
different times. Thus eventually they can also produce synchronized Byzantine pulses (by using the
synchronized clocks). That solution is not self-stabilizing, as nodes are booted and thus do not initialize
with arbitrary values in the memory. It has, on the other hand, a constant convergence time with respect
to the required rounds of communication, whereas our current solution has a dependency on f, which is due
to the self-stabilization requirement. In [2] it has been shown how, by assuming synchronized pulses, almost
any Byzantine algorithm can be converted to its self-stabilizing Byzantine counterpart in an efficient and
practical manner. To the best of our knowledge there is so far no alternative method besides pulse
synchronization for this. That paper includes a short review on the few other existing self-stabilizing
Byzantine algorithms.

1 The flaw was pointed out by Mahyar Malekpour from NASA LaRC and Radu Siminiceanu from NIA, see [11].

2 Model and Problem Definition 

The system is a network of n nodes that communicate by exchanging messages. The nodes regularly invoke 
"pulses", ideally every Cycle real-time units. The invocation of the pulse is preceded by the sending of a 
message to all the nodes stating the intention of invoking a pulse. We assume that the message passing 
allows for an authenticated identity of the senders. The communication network does not guarantee any 
order on messages among different nodes. Individual nodes have no access to a central clock and there is no 
external pulse system. The hardware clock rate (referred to as the physical timers) of correct nodes has a
bounded drift, ρ, from real-time rate. Consequent to transient failures there can be an arbitrary number of
concurrent Byzantine faulty nodes, the turnover rate between faulty and non-faulty behavior of the nodes 
can be arbitrary and the communication network may behave arbitrarily. Eventually the system behaves 
coherently again but in an arbitrary state. 

Definition 2.1 A node is non-faulty at times that it complies with the following: 

1. (Bounded Drift) Obeys a global constant 0 < ρ ≪ 1 (typically ρ ≈ 10⁻⁶), such that for every real-time
interval [u, v]:

$$(1 - \rho)(v - u) \le \text{physical timer}(v) - \text{physical timer}(u) \le (1 + \rho)(v - u).$$

2. (Obedience) Operates according to the instructed protocol.

3. (Bounded Processing Time) Processes any message of the instructed protocol within π real-time units
of arrival time.
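As an illustration only, the Bounded Drift condition can be tested on a finite set of sampled (real-time, timer reading) pairs. The helper `obeys_bounded_drift` and the sampling setup are hypothetical, and since the definition quantifies over every interval [u, v], a finite sample can only refute the condition, never prove it:

```python
def obeys_bounded_drift(readings, rho):
    """Check Definition 2.1(1) on sampled (real_time, timer_reading) pairs.

    Every pair of samples u < v must satisfy
    (1 - rho)(v - u) <= T(v) - T(u) <= (1 + rho)(v - u).
    A finite sample can only refute the condition, never prove it.
    """
    pts = sorted(readings)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            (u, t_u), (v, t_v) = pts[i], pts[j]
            span = v - u
            if not ((1 - rho) * span <= t_v - t_u <= (1 + rho) * span):
                return False
    return True


# A timer running 0.1 ppm fast obeys rho = 1e-6; one running 1% fast does not.
slow_drift = [(t, t * (1 + 1e-7)) for t in range(0, 100, 10)]
fast_drift = [(t, t * 1.01) for t in range(0, 100, 10)]
print(obeys_bounded_drift(slow_drift, 1e-6), obeys_bounded_drift(fast_drift, 1e-6))
```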

A node is considered faulty if it violates any of the above conditions. We allow for Byzantine behavior of 
the faulty nodes. A faulty node may recover from its faulty behavior once it resumes obeying the conditions 
of a non-faulty node. In order to keep the definitions consistent, the "correction" is not immediate but 
rather takes a certain amount of time during which the non-faulty node is still not counted as a correct 
node, although it supposedly behaves "correctly".² We later specify the time-length of continuous non-faulty
behavior required of a recovering node to be considered correct. 

Definition 2.2 The communication network is non-faulty at periods that it complies with the following: 

1. Any message arrives at its destination node within δ real-time units;

2. The sender's identity and content of any message being received is not tampered. 

Thus, our communication network model is a "bounded-delay" communication network. We do not
assume the existence of a broadcast medium. We assume that the network cannot store old messages for
an arbitrarily long time or lose any more messages, once it becomes non-faulty.³



2 For example, a node may recover with arbitrary variables, which may violate the validity condition if considered correct 
immediately. 

3 It is enough to assume that messages among non-faulty nodes are delivered within the specified time bounds. 
Basic definitions and notations:

We use the following notations, though nodes do not need to maintain all of them as variables. To
distinguish between a real-time value and a node's local-time reading we use t for the former and τ for the
latter.

• d = δ + π. Thus, when the communication network is non-faulty, d is the upper bound on the elapsed
real-time from the sending of a message by a non-faulty node until it is received and processed by every
non-faulty node.

• A pulse is an internal event targeted to happen in "tight"⁴ synchrony at all correct nodes. A Cycle
is the "ideal" time interval length between two successive pulses that a node invokes, as given by the
user. The actual cycle length, denoted cycle (in regular font), has upper and lower bounds as a result of
faulty nodes and the physical clock skew. (Our protocol requires that Cycle > (10f + 16) · d.)

• σ represents the upper bound on the real-time window within which all correct nodes invoke a pulse
(tightness of pulse synchronization). We assume that Cycle ≫ σ. (Our solution achieves σ = 3d.)

• φ_i(t) ∈ ℝ⁺ ∪ {∞}, 0 ≤ i ≤ n − 1, denotes, at real-time t, the elapsed real-time since the last pulse invocation
of p_i. It is also denoted as the "φ of node p_i". We occasionally omit the reference to the time in case it
is clear from the context. For a node, p_j, that has not sent a pulse since initialization of the system,
φ_j = ∞.

• cycle_min and cycle_max are values that define the bounds on the actual cycle length during correct
behavior. (We achieve cycle_min = Cycle − 11d ≤ cycle ≤ Cycle + 9d = cycle_max.)

• Δ_BYZ represents the maximal real-time required to complete the specific self-stabilizing Byzantine
agreement protocol used. (Using SS-Byz-Agree in [fjj it becomes 7(2f + 3)d.)

Note that the protocol parameters n, f and Cycle (as well as the system characteristics d and ρ) are
fixed constants and thus considered part of the incorruptible correct code.⁵ Thus we assume that non-faulty
nodes do not hold arbitrary values of these constants. It is required that Cycle is chosen s.t. cycle_min is
large enough to allow our protocol to terminate in between pulses.
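The derived constants above can be gathered in one place. The following sketch is illustrative only: the class `PulseParams` and its field names are hypothetical, ρ is ignored (as the paper does for readability), and it simply evaluates d = δ + π, σ = 3d, cycle_min = Cycle − 11d, cycle_max = Cycle + 9d and Δ_BYZ = 7(2f + 3)d, and checks the two stated requirements n > 3f and Cycle > (10f + 16)·d:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PulseParams:
    """Hypothetical container for the protocol constants (rho omitted)."""
    n: int          # number of nodes
    f: int          # bound on concurrent Byzantine nodes
    delta: float    # network delivery bound (delta)
    pi: float       # per-message processing bound (pi)
    cycle: float    # target Cycle, in the same time unit

    @property
    def d(self):
        return self.delta + self.pi            # d = delta + pi

    @property
    def sigma(self):
        return 3 * self.d                      # achieved tightness

    @property
    def cycle_min(self):
        return self.cycle - 11 * self.d

    @property
    def cycle_max(self):
        return self.cycle + 9 * self.d

    @property
    def delta_byz(self):
        return 7 * (2 * self.f + 3) * self.d   # when SS-Byz-Agree is used

    def validate(self):
        if not self.n > 3 * self.f:
            raise ValueError("resilience requires n > 3f")
        if not self.cycle > (10 * self.f + 16) * self.d:
            raise ValueError("protocol requires Cycle > (10f + 16) d")


p = PulseParams(n=4, f=1, delta=9, pi=1, cycle=1000)  # times in ms
p.validate()
print(p.d, p.sigma, p.cycle_min, p.cycle_max, p.delta_byz)
```

With δ = 9 ms, π = 1 ms and Cycle = 1000 ms this gives d = 10, σ = 30, cycle_min = 890, cycle_max = 1090 and Δ_BYZ = 350.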

A recovering node should be considered correct only once it has been continuously non-faulty for enough 
time to enable it to have deleted old or spurious messages and to have exchanged information with the other 
nodes through at least a cycle. 

Definition 2.3 The communication network is correct following Δ_net real-time of continuous non-faulty
behavior.⁶

Definition 2.4 A node is correct following Δ_node real-time of continuous non-faulty behavior during a
period that the communication network is correct.⁷

Definition 2.5 (System Coherence) The system is said to be coherent at times that it complies with the 
following: 

1. (Quorum) There are at least n − f correct nodes,⁸ where f is the upper bound on the number of
potentially non-correct nodes, at steady state.

2. (Network Correctness) The communication network is correct. 

4 We consider c · d, for some small constant c, as tight.

5 A system cannot self-stabilize if the entire code space can be perturbed, see [§].

6 We will use Δ_net > d.

7 We will use Δ_node > Cycle + cycle_max.

8 The results can be replaced by 2f + 1 or by T-^^l correct nodes. But for n > 3f + 1 these changes will require some
modifications to the structure of the protocol.






Hence, if the system is not coherent then there can be an arbitrary number of concurrent faulty nodes; 
the turnover rate between the faulty and non-faulty nodes can be arbitrarily large and the communication 
network may deliver messages with unbounded delays, if at all. The system is considered coherent, once the 
communication network and a sufficient fraction of the nodes have been non-faulty for a sufficiently long 
time period for the pre-conditions for convergence of the protocol to hold. The assumption in this paper, as
in any other self-stabilizing algorithm, is that the system eventually becomes coherent.



3 Self-stabilizing Byzantine Pulse-Synchronization 

We now seek to give an accurate and formal definition of the notion of pulse synchronization. The definitions
start by defining a subset of the system states, called pulse_states, that are determined only by the elapsed
real-time since each individual node invoked a pulse (the φ's). Nodes that have "tight" or "close" φ's will be
called a synchronized set of nodes. To complete the definition of synchrony there is a need to address the
recurring brief time periods in which a node in a synchronized set of nodes has just invoked a pulse while
others are about to invoke one. This is addressed by considering nodes whose φ's are almost a Cycle apart.

If all correct nodes in the system comprise a synchronized set of nodes then we say that the pulse_state
is a synchronized_pulse_state of the system. The goal of the algorithm is hence to reach a
synchronized_pulse_state of the system and to stay in such a state.

• The pulse_state of the system at real-time t is given by:

$$\text{pulse\_state}(t) = (\phi_0(t), \ldots, \phi_{n-1}(t)).$$

• Let G be the set of all possible pulse_states of a system S.

• A set of nodes, N, is called synchronized at real-time t if
$\forall p_i, p_j \in N$, $\phi_i(t), \phi_j(t) \le \text{cycle}_{\max}$, and one of the following is true:

1. $|\phi_i(t) - \phi_j(t)| \le \sigma$, or

2. $\text{cycle}_{\min} - \sigma \le |\phi_i(t) - \phi_j(t)| \le \text{cycle}_{\max}$ and $|\phi_i(t - \sigma) - \phi_j(t - \sigma)| \le \sigma$.

• s ∈ G is a synchronized_pulse_state of the system at real-time t if the set of correct nodes is
synchronized at real-time t.
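The definition translates directly into a check over all pairs of nodes. The sketch below is a hypothetical helper: its name, and the choice to pass φ_i(t) and φ_i(t − σ) as two parallel lists, are assumptions, and it reads every extracted inequality as non-strict:

```python
import math


def is_synchronized(phi_now, phi_prev, sigma, cycle_min, cycle_max):
    """Check whether a set of nodes is synchronized at real-time t.

    phi_now[i]  = phi_i(t), elapsed real-time since p_i's last pulse
                  (math.inf if p_i never pulsed);
    phi_prev[i] = phi_i(t - sigma).
    """
    if any(phi > cycle_max for phi in phi_now):
        return False                      # also rejects phi = inf
    k = len(phi_now)
    for i in range(k):
        for j in range(i + 1, k):
            gap = abs(phi_now[i] - phi_now[j])
            if gap <= sigma:
                continue                  # condition 1: close phases
            wraps = cycle_min - sigma <= gap <= cycle_max
            if wraps and abs(phi_prev[i] - phi_prev[j]) <= sigma:
                continue                  # condition 2: one just pulsed, one about to
            return False
    return True


# sigma = 3, cycle_min = 89, cycle_max = 109 (e.g. Cycle = 100, d = 1).
print(is_synchronized([10, 12], [7, 9], 3, 89, 109))    # close phases
print(is_synchronized([1, 99], [98, 96], 3, 89, 109))   # straddling a pulse
print(is_synchronized([10, 50], [7, 47], 3, 89, 109))   # out of sync
print(is_synchronized([1, math.inf], [98, math.inf], 3, 89, 109))
```

The second call illustrates condition 2: node 0 pulsed one unit ago and node 1 is about to pulse, so their current φ's are almost a Cycle apart, while σ time units earlier they were within σ of each other.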

Definition 3.1 The Self-Stabilizing Pulse Synchronization Problem 

Convergence: Starting from an arbitrary system state, the system reaches a synchronized_pulse_state
after a finite time.

Closure: If s is a synchronized_pulse_state of the system at real-time t_0 then for every real-time t ≥ t_0,

1. pulse_state(t) is a synchronized_pulse_state,

2. In the real-time interval [t_0, t] every correct node will invoke at most a single pulse if t − t_0 < cycle_min
and will invoke at least a single pulse if t − t_0 > cycle_max.
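For a single node, the second Closure condition amounts to two constraints on its observed pulse times: consecutive pulses are at least cycle_min apart, and no pulse-free stretch (including the stretches before the first and after the last pulse) exceeds cycle_max. The helper below is a hypothetical sketch of that reading and assumes all pulse times lie within the observation window:

```python
def satisfies_frequency_bounds(pulse_times, start, end, cycle_min, cycle_max):
    """Closure condition 2 for one node's pulses observed in [start, end].

    At most one pulse in any window shorter than cycle_min means consecutive
    pulses are at least cycle_min apart; at least one pulse in any window
    longer than cycle_max means no pulse-free stretch (including the stretches
    at either end of the observation) exceeds cycle_max.
    """
    times = sorted(pulse_times)
    for a, b in zip(times, times[1:]):
        if b - a < cycle_min:
            return False
    stretches = [start] + times + [end]
    for a, b in zip(stretches, stretches[1:]):
        if b - a > cycle_max:
            return False
    return True


print(satisfies_frequency_bounds([0, 100, 200], 0, 250, 89, 109))  # True
print(satisfies_frequency_bounds([0, 50], 0, 60, 89, 109))         # False: too close
print(satisfies_frequency_bounds([0, 150], 0, 150, 89, 109))       # False: gap too long
```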

The second Closure condition intends to tightly bound the effective pulse invocation frequency within
a priori bounds. This is in order to rule out any trivial solution that could synchronize the nodes, but be
completely unusable, such as instructing the nodes to invoke a pulse every σ time units. Note that this is a
stronger requirement than the "linear envelope progression rate" typically required by clock synchronization 
algorithms, in which it is only required that clock time progress as a linear function of real-time. 
3.1 The Pulse Synchronization Algorithm

The self-stabilizing Byzantine pulse synchronization algorithm presented is called Ab-Pulse-Synch (for
Agreement-based Pulse Synchronization). A cycle is the time interval between two successive pulses that a
node invokes. The input value Cycle is the ideal length of the cycle. The actual real-time length of a cycle
may deviate from the value Cycle as a consequence of the clock drifts, uncertain message delays and behavior
of faulty nodes. In the proof of Lemma 3.12 the extent of this deviation is explicitly presented.

The environment is one without any granted synchronization among the correct nodes besides a bound
on the message delay. Thus, it is of no use for a sending node to attach some time stamp or round
number to its messages in order for the nodes to have a notion as to when those messages supposedly were
sent. Hence, in order for all correct nodes to symmetrically relate to any message disseminated by some
node, a mechanism for agreeing on which phase of the algorithm or "time" the message relates to must
be implemented. This is fulfilled by using SS-Byz-Agree, a self-stabilizing Byzantine agreement protocol
presented in jSj. The mode of operation of this protocol is as follows: A node that wishes to initiate agreement
on a value does so by disseminating an initialization message to all nodes that will bring them to (explicitly)
invoke the SS-Byz-Agree protocol. Nodes that did not invoke the protocol may join in and execute the
protocol in case enough messages from other nodes are received during the protocol. The protocol requires
correct initiating nodes not to disseminate initialization messages too often. In the context of the current
paper, a "Support-Pulse" message serves as the initialization message.

When the protocol terminates, the protocol SS-Byz-Agree returns at each node q a triplet (p, m, τ_p^q),
where m is the agreed value that p has sent. The value τ_p^q is an estimate, on the receiving node q's local
clock, as to when node p has sent its value m. We also denote it as the "recording time" of (p, m). Thus, a
node q's decision value is (p, m, τ_p^q) if the nodes agreed on (p, m). If the sending node p is faulty then some
correct nodes may agree on (p, ⊥), where ⊥ denotes a non-value, and others may not invoke the protocol at
all. The function rt(τ_q) represents the real-time when the local clock of q reads τ_q. The Ab-Pulse-Synch
algorithm uses the SS-Byz-Agree protocol for a single message only (the "Support-Pulse" message) and not
for every message communicated. Thus the agreement is on whether a certain node sent a "Support-Pulse"
message and when, and not on any actual value sent. Correct nodes do not send this message more than
once in a cycle.

The SS-Byz-Agree protocol satisfies the following typical Byzantine agreement properties:

Agreement: If the protocol returns a value at a correct node, it returns the same value at all correct
nodes;

Validity: If all correct nodes are triggered to invoke the protocol SS-Byz-Agree by a value sent by a correct
node p, then all correct nodes return that value;

Termination: The protocol terminates in a finite time;

It also satisfies some specific timeliness properties that are listed in Section 3.2.

The heuristics behind the Ab-Pulse-Synch protocol are as follows:

• Once the node approaches its end of Cycle, as measured on its physical timer, it sends a "Propose-Pulse" 
message stating so to all nodes. 

• When (n − f) distinct "Propose-Pulse" messages are collected, the node sends a "Support-Pulse" message
that states so to all nodes. This serves as the initialization message for invoking agreement. 

• Upon receiving such a message a receiving node invokes self-stabilizing Byzantine agreement ([El) on 
the fact that it received such a message from the specific node. We require that Cycle be long enough 
to allow the agreement instances to terminate. 

• If all correct nodes invoked agreement on the same message within a short time window then they 
will all agree that the sender indeed sent this "Support-Pulse" message and all will have proximate 
estimates as of when that node could have sent this message. 

• The time estimate is then used to reset the countdown timer for the next pulse invocation and causes a
consequent "Reset" message to be sent. Each new agreement termination causes a renewed reset.
• Upon arrival of a reset message the sending node is taken off the list of nodes that have ended their
Cycle (as indicated by the earlier arrival of a "Propose-Pulse" message from that node).

• Thus, some short time after all correct nodes have done at least one reset of their cycle countdown 
timer, no new agreement can be initiated by any node (faulty or correct). 

• Thus, there is one agreement termination that marks a small time-window within which all correct 
nodes do a last reset of the cycle countdown timer. Thus, essentially, all correct nodes have synchronized 
the invocation of their next pulse. 

The algorithm is executed in an "event-driven" manner. Thus, each node checks the conditions and 
executes the steps (blocks) upon an event of receiving a message or a timer event. To simplify the presentation 
it is assumed in the algorithm that when a correct node sends a message it receives its own message through 
the communication network, like any other correct node. 

The algorithm assumes a timer that measures intervals of time of length Cycle. The algorithm uses several
sets of messages or references that are reset throughout the algorithm, and every message that has arrived
more than Cycle + 2d ago is erased.
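A minimal sketch of such a bounded-retention store, assuming one store per message type at each node, is shown below. The class `MessageStore` is hypothetical; it combines cleanup rules G1 (keep only the latest message per sender) and G5 (drop anything older than Cycle + 2d), and, as an added assumption, also discards entries stamped in the "future", which can only stem from a corrupted initial state. Timer wrap-around is ignored here:

```python
class MessageStore:
    """Hypothetical store for one message type at one node.

    Keeps only the latest message per sender (cleanup rule G1) and drops any
    entry older than Cycle + 2d (rule G5). Entries stamped in the "future",
    which can only come from a corrupted initial state, are dropped as well.
    Local times are plain numbers here; timer wrap-around is ignored.
    """

    def __init__(self, cycle, d):
        self.max_age = cycle + 2 * d
        self.latest = {}                      # sender -> (arrival_time, payload)

    def receive(self, sender, payload, now):
        self.latest[sender] = (now, payload)  # G1: newer replaces older
        self.purge(now)

    def purge(self, now):
        self.latest = {
            s: (t, m) for s, (t, m) in self.latest.items()
            if 0 <= now - t <= self.max_age   # G5, plus future-stamp repair
        }

    def senders(self, now):
        self.purge(now)
        return set(self.latest)


store = MessageStore(cycle=100, d=10)   # max age 120
store.receive("p", "Propose-Pulse", now=0)
store.receive("q", "Propose-Pulse", now=50)
print(sorted(store.senders(115)))  # ['p', 'q']
print(sorted(store.senders(125)))  # ['q']
```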

The algorithm assumes the ability of nodes to estimate some time intervals, as in Line C2. These
estimates can also be carried out in a self-stabilizing environment, by tagging each event according to the
reading of the local timer. So even if the initial values are arbitrary and cause the non-faulty node to behave
inconsistently, by the time it is considered correct the values will have been reset to the right values. Note
that the nodes do not exchange clock values; rather, they measure time locally on their own local timers.
It is assumed that a non-faulty node handles the wrap around of its local timer while estimating the time
intervals.
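One common way to handle wrap-around when measuring an interval between two tagged readings is modular subtraction. The sketch below assumes a hypothetical 32-bit local timer and that the timer wraps at most once between the two readings; the names `elapsed` and `happened_within` are illustrative, not from the paper:

```python
TIMER_BITS = 32                 # hypothetical width of the local timer
TIMER_MODULUS = 1 << TIMER_BITS


def elapsed(now, then):
    """Local time from reading `then` to reading `now`, assuming the timer
    wrapped around at most once in between."""
    return (now - then) % TIMER_MODULUS


def happened_within(now, event_tag, window):
    """True if the tagged event lies within the last `window` local-time
    units, e.g. the 'last Cycle - 8d' test of Line C2."""
    return elapsed(now, event_tag) < window


print(elapsed(10, 4))                              # 6
print(elapsed(5, TIMER_MODULUS - 3))               # 8, across a wrap-around
print(happened_within(5, TIMER_MODULUS - 3, 100))  # True
```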

Note that there is no real reason to keep a received message after it has been processed and its sender
has been referred to in the appropriate data structures. Hence, if messages are said to be deleted after a certain
period, this refers to the reference to the message and not to the message itself, which can be deleted
subsequent to processing.

For reasons of readability we have omitted the hardware clock skew ρ from the constants, equations and
proofs. The introduction of ρ does not change the protocol whatsoever nor any of the proof arguments. It
only adds a small insignificant factor to many of the bounds.

We now seek to explain in further detail the blocks of the algorithm: 

Block A: We assume that a background process continuously reduces the counter cycle_countdown,
intended to make the node count Cycle time units on its physical timer. On reaching 0, the background process
resets the value back to Cycle. The node then expresses its intention to synchronize its forthcoming pulse invocation with
the pulses of the other nodes by sending an endogenous "Propose-Pulse" message to all nodes. Note that a
reset is also done if cycle_countdown holds a value not between 0 and Cycle. The value of cycle_countdown
is also reset once the pulse is invoked. Observe that nodes typically send more than one message in a cycle,
to prevent cases in which the system may start in a deadlocked state.
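Block A can be sketched as a countdown that, after an arbitrary (possibly out-of-range) initial value, repairs itself and then fires a "Propose-Pulse" every Cycle time units. This is a minimal sketch under stated assumptions: the class `CycleCountdown` is hypothetical, time advances in discrete ticks rather than continuously (so the zero test of Line A1 becomes `<= 0`), and the reset on pulse invocation (Line E6) is not modelled:

```python
class CycleCountdown:
    """Sketch of Block A together with cleanup rule G3.

    The paper assumes a background process that continuously reduces
    cycle_countdown; here time advances in discrete ticks of `dt`, so the
    zero test of Line A1 becomes `<= 0`.
    """

    def __init__(self, cycle, initial):
        self.cycle = cycle
        self.value = initial        # may be arbitrary after a transient fault

    def tick(self, dt=1):
        """Advance local time by dt; return True when a Propose-Pulse is due."""
        if not 0 <= self.value <= self.cycle:   # G3: repair out-of-range value
            self.value = self.cycle
        self.value -= dt
        if self.value <= 0:                     # A1
            self.value = self.cycle             # A2
            return True                         # A3: send "Propose-Pulse" to all
        return False


c = CycleCountdown(cycle=10, initial=3)
print([i for i in range(25) if c.tick()])          # [2, 12, 22]
corrupted = CycleCountdown(cycle=10, initial=-7)
print([i for i in range(25) if corrupted.tick()])  # [9, 19]
```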

Block B: The "Propose-Pulse" messages are accumulated at each correct node in its proposers set. We 
say that two messages are distinct if they were sent by different nodes. 
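Keeping proposers in a set makes distinctness automatic: a sender is counted once however many "Propose-Pulse" messages it sends. The hypothetical class below sketches Lines B1-B2 together with the "Reset" handling of Lines F1-F2 from Figure 1. It assumes that F2 records p in recent_reset even if p was not currently in proposers, and it omits the timed decay of recent_reset (rule G2):

```python
class ProposerTracker:
    """Hypothetical per-node state for Lines B1-B2 and F1-F2.

    Sets make "distinct" automatic: a sender is counted once however many
    Propose-Pulse messages it sends. The timed decay of recent_reset
    (rule G2) is omitted.
    """

    def __init__(self):
        self.proposers = set()
        self.recent_reset = set()

    def on_propose_pulse(self, p):           # B1-B2
        if p not in self.recent_reset:
            self.proposers.add(p)

    def on_reset(self, p):                   # F1-F2
        self.proposers.discard(p)
        self.recent_reset.add(p)


t = ProposerTracker()
for sender in ["a", "a", "b"]:
    t.on_propose_pulse(sender)
print(sorted(t.proposers))                           # ['a', 'b']
t.on_reset("a")
t.on_propose_pulse("a")                              # ignored: "a" has just reset
print(sorted(t.proposers), sorted(t.recent_reset))   # ['b'] ['a']
```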

Block C: These messages are accumulated until enough (at least n − f) have been collected. If in
addition the node has already proposed itself, then the node will declare this event through the sending of a
"Support-Pulse" message, unless it has already sent such a message not long ago. The message bears a
reference to the nodes in the proposers set of the sender. Note that a node that was not able to send the
message because it had sent one not long ago may send it later when the conditions hold.
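The sending condition of Block C (Lines C1-C2) reduces to a small predicate. The helper below is hypothetical; it reads the threshold as "at least n − f", matching the prose, and represents "no record of a previous Support-Pulse" as `None`. Timer wrap-around is ignored:

```python
def should_send_support(q, proposers, n, f, now, last_support_sent, cycle, d):
    """Lines C1-C2 at node q (hypothetical helper).

    last_support_sent is the local time q last sent a Support-Pulse,
    or None if it has no such record.
    """
    if q not in proposers or len(proposers) < n - f:      # C1
        return False
    if last_support_sent is None:
        return True
    return now - last_support_sent >= cycle - 8 * d       # C2


# n = 4, f = 1, Cycle = 1000, d = 10: the C2 window is 920.
print(should_send_support("q", {"q", "a", "b"}, 4, 1, 100, None, 1000, 10))  # True
print(should_send_support("q", {"a", "b", "c"}, 4, 1, 100, None, 1000, 10))  # False
print(should_send_support("q", {"q", "a", "b"}, 4, 1, 100, 50, 1000, 10))    # False
print(should_send_support("q", {"q", "a", "b"}, 4, 1, 950, 0, 1000, 10))     # True
```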

Block D: Any such "Support-Pulse" message received is then checked for credibility by verifying that the
history it carries has enough (at least f + 1) backing-up in the receiver's proposers set and that a previous
Algorithm Ab-Pulse-Synch (n, f, Cycle)   /* continuously executed at node q */

A1. if (cycle_countdown = 0) then   /* assumes a background process that continuously reduces cycle_countdown */
A2. cycle_countdown := Cycle;
A3. send "Propose-Pulse" message to all;   /* endogenous message */

B1. if received "Propose-Pulse" message from a sender p and p ∉ recent_reset_q then
B2. add p to proposers_q;

C1. if q ∈ proposers_q and ‖proposers_q‖ ≥ n − f and
C2. did not send a "Support-Pulse" in the last Cycle − 8d then
C3. send "Support-Pulse(proposers_q)" to all;   /* support the forthcoming pulse */

D1. if received "Support-Pulse(proposers_p)" message from a sender p and in the last Cycle − 11d
D2. did not invoke SS-Byz-Agree(p, "support") or decide on (p, "support", _) and
D3. within d of its reception ‖(proposers_q ∪ recent_reset_q) ∩ proposers_p‖ ≥ f + 1 then
D4. SS-Byz-Agree(p, "support");   /* invoke agreement on the pulse supporter */

E1. if decided on (p, "support", τ_p^q) at some local-time τ_q then   /* on non-⊥ value */
E2. if τ_p^q > latest_support then
E3. latest_support := τ_p^q;   /* the latest agreed supporter so far at q */
E4. if not invoked a pulse since local-time τ_q − (Δ_BYZ + 6d) then   /* pulse separation */
E5. invoke the pulse event;
E6. cycle_countdown := Cycle − (τ_q − τ_p^q);   /* reset cycle */
E7. send "Reset" message to all and remove yourself, q, from proposers_q;

F1. if received "Reset" from a sender p then
F2. move p from proposers_q to recent_reset_q;   /* recent_reset decays within 2d + ε time */

Continuously ongoing cleanup:
G1. delete an older message if a subsequent one arrives from the same sender;
G2. delete any data in recent_reset_q after 2d + ε time units;
G3. reset cycle_countdown to be Cycle if cycle_countdown ∉ [0, Cycle];
G4. reset latest_support to be τ_q − Cycle if latest_support ∉ [τ_q − Cycle, τ_q];
G5. delete any other message or data that is older than Cycle + 2d time units;



Figure 1: The Ab-Pulse-Synch Pulse Synchronization Algorithm 

message was not sent recently. Only then is agreement initiated on a credible pulse supporter. Note that a correct node would not have supported a pulse (sent a "Support-Pulse" message) unless it had received n − f "Propose-Pulse" messages and had not sent a "Support-Pulse" recently. Since at most f of these n − f proposers are faulty and n > 3f, at least n − 2f ≥ f + 1 of them are correct. Thus all correct nodes will receive at least f + 1 propose messages from correct nodes and will join the agreement initiated by the pulse supporter within d real-time units.
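The credibility test of Line D3 and the rate limit of Lines D1-D2 reduce to simple set and time arithmetic. The Python sketch below assumes integer node identifiers and float local times; the function names `is_credible` and `agreement_allowed` are hypothetical, and the "within d of its reception" window of Line D3 is not modelled (the caller is assumed to re-evaluate the test during that window).

```python
from typing import AbstractSet, Optional


def is_credible(proposers_p: AbstractSet[int], proposers_q: AbstractSet[int],
                recent_reset_q: AbstractSet[int], f: int) -> bool:
    """Line D3: at least f + 1 entries of the sender's reported set are backed by the
    receiver's own view (its current proposers or its recent resetters)."""
    backing = (set(proposers_q) | set(recent_reset_q)) & set(proposers_p)
    return len(backing) >= f + 1


def agreement_allowed(now: float, last_invoked_or_decided: Optional[float],
                      cycle: float, d: float) -> bool:
    """Lines D1-D2: no invocation of, or decision on, (p, "support") in the last Cycle - 11d."""
    if last_invoked_or_decided is None:
        return True
    return now - last_invoked_or_decided >= cycle - 11 * d
```

With n = 4 and f = 1, a "Support-Pulse" listing {0, 1, 3} is credible at a receiver holding 0 and 3 in proposers_q and 1 in recent_reset_q, whereas a list backed by only one of the receiver's entries is rejected; since at most f identifiers can belong to faulty nodes, the f + 1 threshold forces at least one backing entry to be a correct node.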

Block E: The Byzantine agreement protocol decides whether a certain node issued a "Support-Pulse" message. Each node q decides at some local-time τ_q. The agreement protocol also returns an estimate of when, on the deciding node's local clock, the message was sent by the initiating node. This time is denoted τ_q^p. For a specific agreement, the real-time translations of the τ^p values at different correct nodes differ by a bounded amount.

When a node decides on a value, it checks whether the τ^p returned by the agreement protocol is the most recent one decided on so far in the current cycle. Only then are Lines E3-E7 executed. Note that the same agreement instance may return a τ^p that is the most recent one at a certain correct node but not at another correct node. This can happen because correct nodes terminate the ss-Byz-Agree protocol within 3d time units of each other,⁹ and their real-time translations of the τ^p values may differ by 5d. Thus, this introduces an uncertainty of 3d time units between the executions of the subsequent lines at correct nodes.

In Lines E4-E5 a pulse is invoked if no pulse has been invoked recently. In Line E6 the node resets the cycle so that the next pulse invocation is targeted to happen about one Cycle later. In Line E7 a "Reset" message is sent to all nodes to inform them that the cycle has been reset. The function of this message is to have every node that resets be taken out of the proposers sets of all other correct nodes.¹⁰ To ensure that only one pulse is invoked within the minimal time span of a cycle, a pulse is not invoked in Line E4 if one was invoked recently.
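The decision handling of Lines E1-E7 can be sketched in Python as below. This is a hypothetical model, not the authors' code: the class `PulseState` is an assumption, the broadcast of "Reset" (and the removal of q from proposers_q) is modelled only as a counter, and rule G4 is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PulseState:
    """Hypothetical per-node state touched by Block E (Lines E1-E7)."""
    cycle: float
    d: float
    delta_byz: float
    latest_support: float = float("-inf")
    last_pulse: Optional[float] = None
    cycle_countdown: float = 0.0
    pulses: List[float] = field(default_factory=list)
    resets_sent: int = 0

    def on_decision(self, tau_q: float, tau_p: float) -> None:
        """Handle a non-bottom decision at local time tau_q with recording time tau_p."""
        # E2: act only on a recording time later than any agreed on so far.
        if tau_p <= self.latest_support:
            return
        self.latest_support = tau_p  # E3
        # E4-E5: pulse separation; no pulse if one was invoked at or after tau_q - (Delta_BYZ + 6d).
        if self.last_pulse is None or self.last_pulse < tau_q - (self.delta_byz + 6 * self.d):
            self.last_pulse = tau_q
            self.pulses.append(tau_q)
        # E6: align the next cycle with the agreed recording time, not with tau_q.
        self.cycle_countdown = self.cycle - (tau_q - tau_p)
        # E7: broadcast "Reset" (modelled here as a counter only).
        self.resets_sent += 1
```

For instance, with Cycle = 100, d = 1 and Δ_BYZ = 20, two decisions at local times 50 and 52 both reset the cycle and send "Reset", but only the first invokes a pulse; a decision whose recording time is older than latest_support is ignored entirely.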

Block F: This causes all correct nodes to eventually remove all other correct nodes from their proposers sets. Thus, about 2d after all correct nodes have executed Line E7 at least once, no instance of ss-Byz-Agree will be initiated by any correct node, and consequently no more agreements can terminate (beyond the currently running ones). The last agreement decisions of the correct nodes, made within a short time window of each other, return different but closely bounded τ^p values at the correct nodes. Consequently, they all reset their cycle_countdown counters to proximate values. This yields a quiescent window between the termination of the last agreement and the next pulse invocations, which occur within a small time window of each other.
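Block F, together with the decay rule G2 that bounds how long a node stays in recent_reset_q, can be sketched as follows. The class `ResetTracker` and the representation of recent_reset_q as a map from sender to arrival time are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ResetTracker:
    """Hypothetical model of Block F (Lines F1-F2) together with decay rule G2."""
    d: float
    epsilon: float
    proposers: Set[int] = field(default_factory=set)
    recent_reset: Dict[int, float] = field(default_factory=dict)  # sender -> arrival time

    def on_reset(self, sender: int, now: float) -> None:
        # F2: move the sender from proposers_q to recent_reset_q.
        self.proposers.discard(sender)
        self.recent_reset[sender] = now

    def decay(self, now: float) -> None:
        # G2: forget "Reset" entries older than 2d + epsilon.
        horizon = 2 * self.d + self.epsilon
        self.recent_reset = {p: t for p, t in self.recent_reset.items() if now - t <= horizon}
```

With d = 1 and ε = 0.5, a "Reset" received at local time 10 keeps its sender in recent_reset_q until time 12.5 and is forgotten afterwards.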

Block G: The scheme outlined above is not sufficient to overcome cases in which some nodes initialize with reference to spurious messages that other nodes never actually sent. The difficulty is that Byzantine nodes may then intervene and constantly keep the correct nodes in asymmetric views of the sets of messages received. To overcome this, Ab-Pulse-Synch has a decay process in which any data older than some period is deleted.

Note that values are decayed carefully, so that correct nodes never need to consider messages that arrived more than Cycle + 2d ago.
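The remaining cleanup rules can be sketched as pure functions. This is a hypothetical illustration: the inbox representation (a map from (sender, message kind) to local arrival time) and the names `store` and `cleanup` are assumptions, and G2 is left to the recent_reset handling of Block F.

```python
from typing import Dict, Tuple

# Hypothetical representation: (sender id, message kind) -> local arrival time.
Inbox = Dict[Tuple[int, str], float]


def store(inbox: Inbox, sender: int, kind: str, arrival: float) -> None:
    """G1: a newer message of the same kind from the same sender replaces the older one."""
    inbox[(sender, kind)] = arrival


def cleanup(now: float, cycle: float, d: float, inbox: Inbox,
            cycle_countdown: float, latest_support: float) -> Tuple[Inbox, float, float]:
    """Sketch of rules G3-G5 applied at local time `now`."""
    # G5: drop anything that arrived more than Cycle + 2d ago.
    inbox = {key: t for key, t in inbox.items() if now - t <= cycle + 2 * d}
    # G3: an out-of-range countdown is reset to Cycle.
    if not 0 <= cycle_countdown <= cycle:
        cycle_countdown = cycle
    # G4: a latest_support outside [now - Cycle, now] is reset to now - Cycle.
    if not now - cycle <= latest_support <= now:
        latest_support = now - cycle
    return inbox, cycle_countdown, latest_support
```

Because every rule depends only on the current local time and on locally stored timestamps, a node that starts from arbitrary values reaches a state free of spurious data within about one Cycle + 2d, which is what the proof relies on.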

3.2 Proof of Correctness 

The proof of correctness requires very careful argumentation and is not a straightforward standard proof of the basic properties. The critical part of the proof is showing that, despite the completely chaotic initialization of the system, the correct nodes are able to produce some relation among their local clocks and force the faulty nodes to leave a short interval of time to which no recording time refers, followed by an interval during which no correct node updates its latest_support. After such intervals we can argue about the convergence of the states of the correct nodes, proving that stability is secured. The nontraditional values of the various constants bounding Cycle stem from the balance between ensuring the ability to converge and limiting the ability of the Byzantine nodes to disturb the convergence by introducing critically timed pulse events that may disunite the correct nodes.

⁹ This is one of the timeliness properties of the ss-Byz-Agree protocol; see Section 3.2.

¹⁰ Note that a node may send multiple "Reset" messages. This is done to simplify some of the claims in the proof.
The proof shows that when the constants are chosen correctly, no matter what the faulty nodes do and no matter what the initial values are, there will always be two intervals of inactivity, concurrently at all correct nodes, after which the correct nodes restore the consistency of their pulses.

The proof uses the following specific properties of the ss-Byz-Agree protocol ([6]):

Timeliness-Agreement Properties:

1. (agreement) For every two correct nodes q and q' that decide (p, m, τ_q^p) and (p, m, τ_{q'}^p) at local times τ_q and τ_{q'}, respectively:

(a) |rt(τ_q) − rt(τ_{q'})| < 3d, and if validity holds, then |rt(τ_q) − rt(τ_{q'})| < 2d.



actually invoked ss-Byz-Agree(p, m) did so.
(d) rt(τ_q^p) < rt(τ_q) and rt(τ_q) − rt(τ_q^p) < Δ_BYZ for every correct node q.

2. (validity) If all correct nodes invoked the protocol in an interval [t_0, t_0 + d] as a result of some initialization message containing m sent by a correct node p that spaced the sending by at least 6d from the completion of the last agreement on its message, then for every correct node q, the decision time τ_q satisfies t_0 − d < rt(τ_q^p) < rt(τ_q) < t_0 + 3d.

3. (separation) Let q be any correct node that decided on any two agreements regarding p at local times τ_q and τ̄_q; then t_2 + 5d < t̄_1 and rt(τ_q) + 5d < t̄_1 < rt(τ̄_q), where t_2 is the latest real-time at which a correct node invoked ss-Byz-Agree in the earlier agreement and t̄_1 is the earliest real-time at which ss-Byz-Agree was invoked by a correct node in the later agreement.

Ab-Pulse-Synch requires the following bounds on the variables:

• Cycle > max[(10f + 16)d, Δ_BYZ + 14d].

• Δ_node > Cycle + cycle_max.

The requirements above and the definitions of correctness imply that from an arbitrary state the system becomes coherent within 2 cycles.

Note that in all the theorems and lemmata in this paper, unless stated otherwise, it is assumed that the system is coherent, and the claims hold as long as the system stays coherent.

In the proof, whenever we refer to correct nodes that decide, we consider only decisions on non-⊥ values. When the agreement returns ⊥ it is not considered a decision, and in such a case the agreement at other correct nodes may not return anything or may end up decaying all related messages.

Theorem 1 (Convergence) From an arbitrary (but coherent) state, a synchronized_pulse_state is reached within 4 cycles, with σ = 3d.

Proof: A node that recovers may find itself with arbitrary input variables and at an arbitrary step of the protocol. Within a cycle, a recovered node will decay all spurious "messages" that may exist in its data structures. Some of these might have resulted from incorrect initial variables, such as when invoking the ss-Byz-Agree protocol without the specified preconditions. Such effects also die out within a cycle.

The above argument implies that by the time the node is considered correct, all messages sent by non-faulty nodes that are reflected in its data structures were actually sent by them (from whatever arbitrary state they were in). Thus, by the time the system becomes coherent, the correct nodes share the values they hold in the following sense: if a message sent by a non-faulty node is received by a correct node, then within d it will be received by all other correct nodes; and all future messages sent by correct nodes are based on messages that were actually received.



Once the system is coherent, there are at least n − f correct nodes that follow the protocol, and all messages sent among them are delivered by the communication network and processed by the correct nodes within d real-time units.

Lemma 3.1 Within d real-time units of the sending of a "Propose-Pulse" message by a correct node p, p appears in proposers_q of any correct node q. Furthermore, p appears in proposers_q only if p sent a "Propose-Pulse" message within the last d units of time.

Proof: From the coherence of the system, p's message arrives at all nodes within d real-time. By the Timeliness-Agreement property (1d) and the bounds on Cycle, a node that has recently sent a "Reset" message resets its cycle_countdown to a value that is at least Cycle − Δ_BYZ > 14d. Thus, at any correct node, the receipt of its past "Reset" and of its current "Propose-Pulse" are more than 2d apart in real-time, and therefore by the time its "Propose-Pulse" message arrives, p no longer appears in recent_reset_q at any correct node q. The second part holds because p can be in proposers_q without having sent a "Propose-Pulse" message only if node q recovered in that state. But by the time node q is considered correct, any reference to such a message has already been decayed. □

Lemma 3.2 In every real-time interval equal to Cycle, every correct node sends either a "Propose-Pulse" 
message or a "Reset" message. 

Proof: Recall that every correct node's cycle_countdown timer runs continuously in the background and is reset to a value within [0, Cycle] if it initially held an out-of-bound value. Thus, if cycle_countdown is not reset to a new value when a "Reset" is sent, then within Cycle real-time units the cycle_countdown timer will reach 0 and a "Propose-Pulse" message will consequently be sent. Whenever cycle_countdown is reset, its value is always at most Cycle. □

Lemma 3.3 Within d real-time units of the sending of a "Reset" message by a correct node p, p does not appear in proposers_q of any correct node q. Furthermore, a correct node p is deleted from proposers_q only if it sent a "Reset" message.

Proof: The first part follows immediately from executing the protocol in a coherent state. The only sensitive point arises when a "Propose-Pulse" message that was sent by p before the "Reset" message arrives after the "Reset" message. This can happen only if the "Propose-Pulse" message was sent within d of the "Reset" message. But in this case p is in recent_reset_q, so the protocol instructs node q not to add p to proposers_q. For the second part, we need to show that a correct node is not removed from proposers_q because q decayed it. By Lemma 3.1, it appears in proposers_q only because it sent a "Propose-Pulse" message. By Lemma 3.2, it will send a new message before q decays the previous one, because messages are decayed (Block G) only after Cycle + d. □

Lemma 3.4 Every correct node invokes ss-Byz-Agree(p, "support") within d real-time units of the time a correct node p sends a "Support-Pulse" message.

Proof: If a correct node p sent a "Support-Pulse" message in Line C3, then the preconditions of Line D2 hold: the last reception of a "Support-Pulse" from p, and the last invocation of ss-Byz-Agree(p, "support") that followed it, took place at least Cycle − 8d − d ago, proving the first condition. By the Timeliness-Agreement property (2), the last decision took place at least Cycle − 8d − 3d ago, proving the other condition. The condition in Line D3 also holds at all correct nodes. Since proposers_p contains at least n − f ≥ 2f + 1 nodes, at least f + 1 of them are correct; within d real-time units every correct node in proposers_p will appear in proposers_q, and every correct node that was deleted from proposers_q and is not in recent_reset_q should already have been deleted from proposers_p. To prove this last claim, assume that node q received "Reset" from a correct node v at real-time t. By t + d this message should arrive at p, and therefore any "Support-Pulse" message from p whose proposers_p contains v should be sent before that and received before t + 2d, thus before v is removed from recent_reset_q. □
Lemma 3.5 If a correct node p sends "Support-Pulse" at real-time t_0, then every correct node q decides (p, "support", τ_q^p) at some local-time τ_q such that t_0 − d < rt(τ_q^p) < rt(τ_q) < t_0 + 3d and t_0 < rt(τ_q).

Proof: By Lemma 3.4, all correct nodes invoke ss-Byz-Agree(p, "support") in the interval [t_0, t_0 + d]. Thus the preconditions of the Timeliness-Agreement property (2) hold. Therefore, each correct node q decides on (p, _, τ_q^p) at some real-time rt(τ_q) that satisfies t_0 − d < rt(τ_q^p) < rt(τ_q) < t_0 + 3d. □

Lemma 3.6 Let [t, t + Cycle] be an interval such that rt(latest_support) ∉ [t, t + Cycle] at every correct node; then by t + Cycle + 4d all correct nodes decide.

Proof: Assume that all decisions by correct nodes resulted in rt(latest_support) < t. Thus, since there are no updates to cycle_countdown, the cycle_countdown at all correct nodes should expire by t + Cycle. By Lemma 3.5, if any correct node has sent "Support-Pulse" in that interval, then we are done. Otherwise, by that time all correct nodes should have sent a "Propose-Pulse" message. Since no node removes old messages before Cycle + 2d, and more than Cycle − 8d real-time has passed, by t + Cycle + d at least one correct node will send a "Support-Pulse" message. By Lemma 3.4, all correct nodes will invoke ss-Byz-Agree within another d real-time units. The Timeliness-Agreement property (2) implies that by t + Cycle + 4d all will decide. □

Note that if a faulty node sends "Support-Pulse", some correct nodes may join and some may not, and both the actual agreement on a value and the time of such an agreement depend on the behavior of the faulty nodes. We address this later in the proof. We first prove a technical lemma.

Lemma 3.7 Let t' be a time by which all correct nodes have decided on some values since the system became coherent. Let B' and B satisfy B' < B and 3d < B. If no correct node decides on a value that causes latest_support to be updated to a value in the interval [t', t' + B], and no correct node updates its latest_support or resets its cycle_countdown during the real-time interval [t' + B', t' + B], then for any pair of correct nodes q and q', |cycle_countdown_q(t'') − cycle_countdown_{q'}(t'')| < 5d for any t'' with t' + B' < t'' < t' + B.

Proof: By assumption, the agreements prior to t' satisfy the Timeliness-Agreement properties. Past t' + B' and until t' + B no node updates its latest_support. Thus, for all nodes, rt(latest_support) < t'. Let q be the correct node with the maximal rt(latest_support_q), which was set following a decision (p_1, _, τ_q^{p_1}) at time τ_1, where latest_support_q = τ_q^{p_1}. By the Timeliness-Agreement property (1a), any correct node v will execute Line E2 following a decision on (p_1, _, μ_v^{p_1}) at some time μ_1 such that |rt(τ_1) − rt(μ_1)| < 3d. By property (1b), |rt(τ_q^{p_1}) − rt(μ_v^{p_1})| < 5d. Assume first that latest_support_v = μ_v^{p_1}.
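At local-time τ_1, at q:

$$\mathit{cycle\_countdown}_q(\tau_1) = \mathit{Cycle} - (\tau_1 - \tau_q^{p_1}) = \mathit{Cycle} - \big(\mathrm{rt}(\tau_1) - \mathrm{rt}(\tau_q^{p_1})\big).$$

At real-time t'' ≥ rt(τ_1), at q:

$$\mathit{cycle\_countdown}_q(t'') = \mathit{Cycle} - \big(\mathrm{rt}(\tau_1) - \mathrm{rt}(\tau_q^{p_1})\big) - \big(t'' - \mathrm{rt}(\tau_1)\big) = \mathit{Cycle} - \big(t'' - \mathrm{rt}(\tau_q^{p_1})\big).$$

Similarly, at real-time t'' ≥ rt(μ_1), at v:

$$\mathit{cycle\_countdown}_v(t'') = \mathit{Cycle} - \big(t'' - \mathrm{rt}(\mu_v^{p_1})\big).$$

Thus,

$$\big|\mathit{cycle\_countdown}_q(t'') - \mathit{cycle\_countdown}_v(t'')\big| = \big|\mathrm{rt}(\tau_q^{p_1}) - \mathrm{rt}(\mu_v^{p_1})\big| < 5d.$$

Otherwise, v assigned latest_support_v as a result of deciding on some (p_2, _, μ_v^{p_2}) at some time μ_2, with rt(μ_2) < t' and latest_support_v = μ_v^{p_2}. Let τ_2 be the time at q when it decided (p_2, _, τ_q^{p_2}). By the Validity and the Timeliness-Agreement properties, |rt(τ_2) − rt(μ_2)| < 3d and |rt(τ_q^{p_2}) − rt(μ_v^{p_2})| < 5d.

By assumption,

$$\mathrm{rt}(\tau_q^{p_1}) \ge \mathrm{rt}(\mu_v^{p_2}) \ge \mathrm{rt}(\mu_v^{p_1}) > \mathrm{rt}(\tau_q^{p_1}) - 5d.$$

At local-time τ_1, at q:

$$\mathit{cycle\_countdown}_q(\tau_1) = \mathit{Cycle} - (\tau_1 - \tau_q^{p_1}) = \mathit{Cycle} - \big(\mathrm{rt}(\tau_1) - \mathrm{rt}(\tau_q^{p_1})\big).$$

At local-time μ_2, at v:

$$\mathit{cycle\_countdown}_v(\mu_2) = \mathit{Cycle} - (\mu_2 - \mu_v^{p_2}).$$

For any real-time t'' ≥ max{rt(τ_1), rt(μ_2)} we therefore have

$$\mathit{cycle\_countdown}_q(t'') = \mathit{Cycle} - \big(t'' - \mathrm{rt}(\tau_q^{p_1})\big)$$

and

$$\mathit{cycle\_countdown}_v(t'') = \mathit{Cycle} - \big(t'' - \mathrm{rt}(\mu_v^{p_2})\big).$$

Therefore, we conclude

$$\big|\mathit{cycle\_countdown}_q(t'') - \mathit{cycle\_countdown}_v(t'')\big| = \big|\mathrm{rt}(\tau_q^{p_1}) - \mathrm{rt}(\mu_v^{p_2})\big| < 5d.$$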



□ 
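The key identity in the proof of Lemma 3.7 is that, after the Line E6 reset and under the no-drift simplification used in the proofs, cycle_countdown at real-time t'' equals Cycle − (t'' − rt(recording time)), so two counters differ by exactly the difference of their recording times. The Python helper below, whose name and signature are hypothetical, checks this on numbers: with d = 1, node q decides at 10 with recording time 7 and node v decides at 12 with recording time 3 (within the 3d and 5d bounds), and at t'' = 20 their counters are 87 and 83.

```python
def countdown_at(t: float, cycle: float, decided_at: float, recording_time: float) -> float:
    """cycle_countdown at real-time t after the Line E6 reset performed at decided_at,
    assuming no clock drift and no further reset in between."""
    if t < decided_at:
        raise ValueError("the formula only applies after the reset at decided_at")
    value_at_reset = cycle - (decided_at - recording_time)  # Line E6
    return value_at_reset - (t - decided_at)                # counts down at rate 1


cq = countdown_at(20.0, 100.0, decided_at=10.0, recording_time=7.0)
cv = countdown_at(20.0, 100.0, decided_at=12.0, recording_time=3.0)
# The gap between the counters equals the gap between the recording times (7 - 3).
print(cq, cv, abs(cq - cv))
```

The decision times drop out of the difference entirely, which is why the 5d bound on recording times, rather than the 3d bound on decision times, governs the counters.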



Lemma 3.8 If a correct node p sends a "Support-Pulse" at some real-time t_0, then:

1. No correct node will invoke ss-Byz-Agree during the period [t_0 + 6d, t_0 + Cycle − d];

2. No correct node sends a "Support-Pulse" or "Propose-Pulse" during that period;

3. The cycle_countdown counters of all correct nodes expire within 5d of each other at some real-time in the interval [t_0 + Cycle − d, t_0 + Cycle + 6d].

Proof: By Lemma 3.5, each correct node decides on p's "Support-Pulse". Each correct node that did not update its latest_support recently will send a "Reset" message as a result of this decision. Since several agreements on different nodes may be executed concurrently, we need to consider their implications for the resulting behavior of the correct nodes.

Consider first the case in which a correct node reached a decision and sent "Reset" before deciding on p's "Support-Pulse". If that decision took place earlier than t_0 − d then, by the Timeliness-Agreement property (2), the node will update its latest_support after the decision on p's "Support-Pulse".

By the same Timeliness-Agreement properties, every correct node that has not already sent "Reset" will end up updating its latest_support and sending "Reset" at some time during the interval [t_0 − d, t_0 + 3d]. By t_0 + 4d, no correct node will appear in the proposers set of any correct node until it sends a new "Propose-Pulse" message, since its "Reset" message will have arrived at all non-faulty nodes. Thus, from time t_0 + 4d until some correct node sends a new "Propose-Pulse" message, no correct node will send a "Support-Pulse" message. Moreover, past t_0 + 6d no correct node will invoke ss-Byz-Agree in Line D3, because no correct node will appear in recent_reset either. Observe that if a "Propose-Pulse" message from some correct node v is in transit, or if v happened to send one just before sending its "Reset" message, that "Propose-Pulse" will arrive within d of the "Reset" message. Therefore, by the time v is removed from recent_reset all such messages will have arrived, and v will not be added to proposers as a result of such a message later than t_0 + 6d.

Even though different correct nodes may compute their latest_support_q as a result of different agreements, by the Timeliness-Agreement properties (1d) and (2), at time t_0 + 6d the value of latest_support_q satisfies rt(latest_support_q) ∈ [t_0 − d, t_0 + 6d] for every correct node q.

Past time t_0 + 6d and until t_0 + 6d + Δ_BYZ, correct nodes may still decide on values from other agreements that were invoked in the past by faulty nodes. By the Timeliness-Agreement property (1c), no such value results in a latest_support later than t_0 + 6d, since no correct node will invoke ss-Byz-Agree until some correct node sends a new "Propose-Pulse" message.

Let t_q be the latest real-time at which a correct node q updated the calculation of cycle_countdown because of a latest_support value in the interval [t_0 − d, t_0 + 6d]. It will send its next "Propose-Pulse" message at t_q + cycle_countdown = t_q + Cycle − (t_q − rt(latest_support_q)) = Cycle + rt(latest_support_q) > t_0 + Cycle − d. Thus, the earliest real-time at which a correct node will send a "Propose-Pulse" message is t_0 + Cycle − d. Until that time, no correct node will send a "Propose-Pulse" or "Support-Pulse" message or invoke ss-Byz-Agree, proving (1) and (2).

The bound on Cycle implies that during the real-time interval [t_0 + 6d, t_0 + Cycle − Δ_BYZ] there is a window of at least 14d − 6d − d > 3d to which no recording time refers. Denote this interval by [t', t' + B'], where t' + B' < t_0 + Cycle − Δ_BYZ < t_0 + Cycle − d. The above argument implies that in the interval [t' + B', t_0 + Cycle − d] no correct node will update its latest_support, and therefore the conditions of Lemma 3.7 hold.

Thus, the cycle_countdown counters of all correct nodes expire within 5d past time t_0 + Cycle − d. Looking back at the latest real-time t_q ∈ [t_0 − d, t_0 + 6d] at which a correct node q updated the calculation of cycle_countdown, the node will send its next "Propose-Pulse" message at t_q + cycle_countdown = t_q + Cycle − (t_q − rt(latest_support_q)) = Cycle + rt(latest_support_q) < t_0 + Cycle + 6d. This proves (3). □

Lemma 3.6 above implies that the nodes will not deadlock, despite the arbitrary initial states in which they may have recovered. Moreover, by Lemma 3.8, once a correct node succeeds in sending a "Support-Pulse" message, all correct nodes will converge. We are therefore left with the possibility that the faulty nodes exploit the divergence of the initial values of the correct nodes to prevent convergence, by constantly causing them to decide and update their cycle_countdown counters without ever letting a correct node reach a point at which it sends a "Support-Pulse" message.

By Lemma 3.6, within Cycle + 4d of the time the system becomes coherent, all correct nodes execute Line E1. Let t_1 be some real-time in that period by which all non-faulty nodes have executed Line E1. If any correct node sends a "Support-Pulse" message, then we are done. Assume otherwise. Since no correct node will invoke ss-Byz-Agree for any node more than once within Cycle − 11d, as we prove later, there will be at most f decisions between t_1 and t_1 + Cycle − 11d. Each decision returns recording times that range over at most a 5d real-time window. Since Cycle > (10f + 16)d, the interval [t_1, t_1 + Cycle − 11d] is longer than (10f + 5)d; the at most f windows cover at most 5fd of it, leaving more than (5f + 5)d split among at most f + 1 gaps. Hence there is a real-time interval [t_2, t_2 + 5d] to which no recording time refers. This reasoning leads to the following lemma.

Lemma 3.9 Assume that no correct node's decision results in a recording time that refers to a real-time rt(τ^p) in the real-time interval [t', t' + 5d]. Then by t' + Cycle + 4d all correct nodes decide, update their latest_support, and send "Reset", within 3d real-time units of each other.

Proof: By the Timeliness-Agreement property (1b), any decision that takes place later than t' + Δ_BYZ would result in latest_support > t'. By Lemma 3.6, by t' + Cycle + 4d all correct nodes' decisions lead to rt(latest_support) > t', and by assumption to rt(latest_support) > t' + 5d. Let q be the first correct node to decide and update its latest_support to a value larger than t' + 5d, on some (p, _, τ_q^p) with rt(τ_q^p) > t' + 5d, at some real-time t'' > t'. By the Timeliness-Agreement property (1d), t'' > rt(τ_q^p). Moreover, since the decision times of the correct nodes are at most 3d apart (property (1a)), by t'' + 3d all correct nodes will decide on some values and update their latest_support. Therefore, in the interval [t'', t'' + 3d] all correct nodes update their latest_support, with rt(latest_support) > t'. Thus, all correct nodes execute Line E7 as a result of such decisions, and therefore all correct nodes send a "Reset" message within 3d of each other. □
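The quiet interval [t_2, t_2 + 5d] that feeds Lemma 3.9 comes from a counting argument, which the hypothetical Python function below checks mechanically. It scans the recording-time windows (each of width 5d, given by their start points) and returns the start of a sub-interval of the required width that none of them overlaps, or `None`. The function name, the representation of windows by start points, and treating touching endpoints as non-overlapping are assumptions made for illustration.

```python
from typing import List, Optional


def find_free_window(start: float, length: float, window_starts: List[float],
                     width: float) -> Optional[float]:
    """Return the start of a sub-interval of [start, start + length] of the given width
    that no window [s, s + width] overlaps, or None if no such sub-interval exists."""
    end = start + length
    cursor = start  # left edge of the current uncovered gap
    for s in sorted(window_starts):
        # Gap between the covered prefix and the next window, clipped to the interval.
        if min(s, end) - cursor >= width:
            return cursor
        cursor = max(cursor, s + width)
    return cursor if end - cursor >= width else None
```

For f = 3 and d = 1, an interval of length (10f + 5)d = 35 with windows starting at 4, 13 and 22 still leaves [27, 32] free, so `find_free_window(0.0, 35.0, [4.0, 13.0, 22.0], 5.0)` returns 27.0; however the f windows are placed, some gap of width 5d survives.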

Let t_2 be a real-time at which the above lemma holds. Let t_3 be the real-time past t_2 by which all correct nodes send "Reset", as Lemma 3.9 claims. Thus, all correct nodes sent "Reset" in the real-time interval [t_3 − 3d, t_3], and by t_3 + d no correct node will appear in the proposers set of any other correct node.

The final stage of the proof follows from the following lemma.

Lemma 3.10 If all correct nodes send a "Reset" in the period [t_0, t_0 + 3d], then:

1. No correct node will invoke ss-Byz-Agree during the period [t_0 + 6d, t_0 + Cycle − Δ_BYZ];

2. No correct node sends a "Support-Pulse" or "Propose-Pulse" during that period;

3. The cycle_countdown counters of all correct nodes expire within 5d of each other at some real-time in the interval [t_0 + Cycle − Δ_BYZ, t_0 + Cycle + 6d].
Proof: By real-time t_0 + 4d all correct nodes will have received all n − f "Reset" messages and removed the correct nodes from proposers. From that time until some correct node sends a new "Propose-Pulse", no correct node will send a "Support-Pulse" message. Similarly, from t_0 + 6d until some correct node sends a new "Propose-Pulse", no correct node will invoke ss-Byz-Agree in Line D3.

At that time the range of cycle_countdown may be [Cycle − Δ_BYZ, Cycle], since, by the Timeliness-Agreement property (1d), faulty nodes may bring the correct nodes to decide on values that are at most Δ_BYZ in the past.

Until t_0 + 6d + Δ_BYZ, correct nodes may still decide on values from other agreements invoked by faulty nodes. By the Timeliness-Agreement property (1c), until some correct node invokes ss-Byz-Agree, no correct node will decide on any message with rt(τ') > t_0 + 6d (the latest possible recording time).

Let t_q be the latest real-time at which a correct node q updated the calculation of cycle_countdown during the interval [t_0, t_0 + 6d]. It will send its next "Propose-Pulse" message at t_q + cycle_countdown = t_q + Cycle − (t_q − rt(latest_support_q)) = Cycle + rt(latest_support_q). By the Timeliness-Agreement property (1d), and because latest_support_q is computed in the interval [t_0, t_0 + 6d], we conclude that rt(latest_support_q) > t_0 − Δ_BYZ. Thus, t_q + cycle_countdown > t_0 + Cycle − Δ_BYZ, so the earliest time at which a correct node will send a "Propose-Pulse" message is t_0 + Cycle − Δ_BYZ. Until that time, no correct node will send a "Propose-Pulse" or "Support-Pulse" message or invoke ss-Byz-Agree, proving (1) and (2).

The bound on Cycle implies that during the real-time interval [t_0 + 6d, t_0 + Cycle − Δ_BYZ] there is a window of at least 11d − 6d − d > 3d to which no recording time refers. Denote this interval by [t', t' + B'], where t' + B' < t_0 + Cycle − Δ_BYZ < t_0 + Cycle − d. The above argument implies that in the interval [t' + B', t_0 + Cycle − d] no correct node will update its latest_support, and therefore the conditions of Lemma 3.7 hold.

Thus, the cycle_countdown counters of all correct nodes expire within 5d past time t_0 + Cycle − d. Looking back at the latest real-time t_q at which a correct node q updated the calculation of cycle_countdown: since it took place in the interval [t_0 − d, t_0 + 6d] and rt(latest_support_q) cannot be larger than the time at which it is computed, the node will send its next "Propose-Pulse" message at t_q + cycle_countdown = t_q + Cycle − (t_q − rt(latest_support_q)) = Cycle + rt(latest_support_q) < t_0 + Cycle + 6d. This proves (3).

□ 

Corollary 3.11 Under the conditions of Lemma 3.10, if no correct node invoked ss-Byz-Agree in the interval [t_0 − Δ_BYZ, t_0 − B], then the bound t_0 + Cycle − Δ_BYZ in Lemma 3.10 can be replaced by t_0 + Cycle − B − 2d.

Proof: By the Timeliness-Agreement property (1c), no decision can return a recording time that is more than 2d earlier than an invocation of ss-Byz-Agree by a correct node. Therefore, in the proof of Lemma 3.10 the minimal value of rt(latest_support_q) for any correct node q is t_0 − B − 2d. Let t_q be the latest real-time at which a correct node q updated the calculation of cycle_countdown in the interval [t_0, t_0 + 6d]. It will send its next "Propose-Pulse" message at t_q + cycle_countdown = t_q + Cycle − (t_q − rt(latest_support_q)) = Cycle + rt(latest_support_q) > t_0 + Cycle − B − 2d. Thus, the earliest real-time at which a correct node will send a "Propose-Pulse" message is t_0 + Cycle − B − 2d. Until that time, no correct node will send a "Propose-Pulse" or "Support-Pulse" message or invoke ss-Byz-Agree. Thus the bound t_0 + Cycle − Δ_BYZ in Lemma 3.10 can be replaced by t_0 + Cycle − B − 2d. □

We can now state the "fixed-point" lemma: 

Lemma 3.12 If the cycle_countdown counters of all correct nodes expire in the period [t_0, t_0 + 5d], no correct node sent "Support-Pulse" in [t_0 − (Cycle − 8d), t_0], and no correct node invoked ss-Byz-Agree in [t_0 − (Δ_BYZ + 6d), t_0], then:

1. All correct nodes invoke a pulse within 3d real-time units of each other before t_0 + 9d;

2. There exists a real-time t̄_0, t_0 + Cycle − 2d < t̄_0 < t_0 + Cycle + 12d, for which the conditions of Lemma 3.12 hold when t_0 is replaced by t̄_0.
Proof: Assume first that a correct node decided in [t_0, t_0 + 6d]. Let q be the first such correct node to do so, at some real-time t_q. By the Timeliness-Agreement property (1c), and since no correct node invoked ss-Byz-Agree in [t_0 − (Δ_BYZ + 6d), t_0], the recording time must be in the interval [t_0 − 2d, t_q]. By the Timeliness-Agreement property (1a), all correct nodes will decide in the interval [t_q, t_q + 3d]; these decisions cause all correct nodes to update their latest_support, and the conditions for invoking a pulse hold.

Moreover, in the interval [t_q, t_q + 3d] the preconditions of Lemma 3.10 hold. Using Corollary 3.11 with B = 0, we obtain that no "Support-Pulse" is sent in [t_q + 6d, t_q + Cycle − 2d], an interval of length Cycle − 8d, and that all cycle_countdown counters expire within 5d of each other in the interval [t_q + Cycle − 2d, t_q + Cycle + 6d]. Since t_q ∈ [t_0, t_0 + 6d], we conclude that the conditions of the lemma hold for some t̄_0 ∈ [t_0 + Cycle − 2d, t_0 + Cycle + 12d].

Otherwise, no correct node decided in $[t_0, t_0 + 6d]$. This implies that all correct nodes will end up sending their "Propose-Pulse" by $t_0 + 5d$ and a correct node will send "Support-Pulse" by $t_0 + 6d$. Lemma 3.8 completes the proof in a similar way. □
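The interval bookkeeping in the first case of the proof can be checked mechanically: for every $t_q \in [t_0, t_0 + 6d]$, the expiry window $[t_q + \mathit{Cycle} - 2d, t_q + \mathit{Cycle} + 6d]$ should lie inside $[t_0 + \mathit{Cycle} - 2d, t_0 + \mathit{Cycle} + 12d]$, and the window without "Support-Pulse" should have length $\mathit{Cycle} - 8d$. The sketch uses integer real-times and hypothetical helper names.

```python
def expiry_window(t_q: int, cycle: int, d: int) -> tuple[int, int]:
    # Window in which all cycle_countdown counters expire after a decision at t_q.
    return t_q + cycle - 2 * d, t_q + cycle + 6 * d


def quiet_window(t_q: int, cycle: int, d: int) -> tuple[int, int]:
    # Window with no "Support-Pulse" from correct nodes after a decision at t_q.
    return t_q + 6 * d, t_q + cycle - 2 * d


def next_t0_range(t0: int, cycle: int, d: int) -> tuple[int, int]:
    # Range stated in Lemma 3.12 for the next fixed point t0'.
    return t0 + cycle - 2 * d, t0 + cycle + 12 * d


if __name__ == "__main__":
    t0, cycle, d = 0, 100, 1
    lo_range, hi_range = next_t0_range(t0, cycle, d)
    for t_q in range(t0, t0 + 6 * d + 1):  # every decision time in [t0, t0 + 6d]
        lo, hi = expiry_window(t_q, cycle, d)
        assert lo_range <= lo and hi <= hi_range
        q_lo, q_hi = quiet_window(t_q, cycle, d)
        assert q_hi - q_lo == cycle - 8 * d
    print("all windows fit")
```

The extreme case $t_q = t_0 + 6d$ is what pushes the upper end of the range to $t_0 + \mathit{Cycle} + 12d$.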

Observe that once Lemma 3.12 holds, it will hold as long as the system is coherent, since its preconditions continuously hold. So to complete the proof of the theorem, we need to show that once the system becomes coherent, the preconditions of Lemma 3.12 will eventually hold.

Denote by $t$ the real-time at which the system became coherent. By Lemma 3.6, by $t + \mathit{Cycle} + 4d$ all correct nodes execute Line E1. Let $t_1$ be some real-time in that period by which all correct nodes have executed Line E1. If any correct node sends a "Support-Pulse" message, then by Lemma 3.8 the preconditions of Lemma 3.12 hold.

Assume otherwise. By the Timeliness-separation property there are no concurrent agreements associated with the same sender of a "Support-Pulse" message. Since the separation between decisions is at least $5d$, every correct node will be aware of a decision before invoking the next SS-Byz-Agree, and therefore the test in Line D2 rules out more than a single decision per sending node within $\mathit{Cycle} - 11d$. Since $\mathit{Cycle} > (10f + 16)d$, there will be at most $f$ decisions between $t_1$ and $t_1 + \mathit{Cycle} - 11d$. Since each decision returns recording times to nodes that range over a real-time window of at most $5d$, there must be a real-time interval $[t_2, t_2 + 5d]$ such that no recording time of any correct node refers to any real-time within it. Note that $t_2 \leq t_1 + \mathit{Cycle} - 11d - 5d \leq t + 2\cdot\mathit{Cycle} - 12d$. By Lemma 3.9, by $t + 2\cdot\mathit{Cycle} - 12d + \mathit{Cycle} + 4d = t + 3\cdot\mathit{Cycle} - 8d$ there exists a $t_3$ such that all correct nodes send "Reset" in the interval $[t_3 - 3d, t_3]$. Thus, the preconditions of Lemma 3.10 hold. Hence, by $t + 3\cdot\mathit{Cycle} - 8d - 3d + \mathit{Cycle} + 6d = t + 4\cdot\mathit{Cycle} - 5d$ the preconditions of Lemma 3.12 hold, because either a correct node has sent "Support-Pulse" before that time or they follow from Lemma 3.10.
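The counting step that yields the free interval $[t_2, t_2 + 5d]$ can be illustrated with a small sketch. It assumes, as the text states, that each of at most $f$ decisions yields recording times inside a window of length $5d$, and models each window as a closed interval $[s, s + 5d]$ clipped to $[t_1, t_1 + \mathit{Cycle} - 11d]$. Since the windows cover at most $5fd$ of an interval of length $\mathit{Cycle} - 11d > (10f + 5)d$, the at most $f + 1$ uncovered gaps have total length greater than $5(f+1)d$, so one of them exceeds $5d$. The function name largest_free_gap is hypothetical.

```python
import random


def largest_free_gap(start: float, end: float,
                     window_starts: list[float], width: float) -> tuple[float, float]:
    """Return (gap_start, gap_length) of the longest part of [start, end]
    not covered by any window [s, s + width]."""
    # Clip windows to [start, end] and drop those entirely outside it.
    windows = sorted((max(s, start), min(s + width, end))
                     for s in window_starts if s + width > start and s < end)
    best_at, best_len, cursor = start, 0.0, start
    for lo, hi in windows:
        if lo - cursor > best_len:
            best_at, best_len = cursor, lo - cursor
        cursor = max(cursor, hi)  # handles overlapping windows
    if end - cursor > best_len:
        best_at, best_len = cursor, end - cursor
    return best_at, best_len


if __name__ == "__main__":
    rng = random.Random(0)
    d = 1.0
    for _ in range(1000):
        f = rng.randint(0, 6)
        cycle = (10 * f + 16) * d + 0.5   # any value above (10f + 16)d
        t1 = 0.0
        horizon = t1 + cycle - 11 * d
        starts = [rng.uniform(t1 - 5 * d, horizon) for _ in range(f)]
        _, gap = largest_free_gap(t1, horizon, starts, 5 * d)
        assert gap > 5 * d
    print("a free gap longer than 5d always exists")
```

Random placements only illustrate the pigeonhole argument; the guarantee itself comes from the averaging bound stated above, not from the sampling.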

Thus the system converges within less than $4\cdot\mathit{Cycle}$ from a coherent state. One can save one Cycle in the bound by overlapping the first cycle with the second one while the non-faulty nodes are not yet considered correct.
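The convergence bound is the sum of the delays used in the preceding argument. Written out, with $t$ the real-time at which the system became coherent, the first line is the time by which, via Lemma 3.9, all correct nodes send "Reset", and the second is the time by which the preconditions of Lemma 3.12 hold:

```latex
\begin{aligned}
(t + 2\cdot\mathit{Cycle} - 12d) + (\mathit{Cycle} + 4d) &= t + 3\cdot\mathit{Cycle} - 8d,\\
(t + 3\cdot\mathit{Cycle} - 8d) - 3d + (\mathit{Cycle} + 6d) &= t + 4\cdot\mathit{Cycle} - 5d \;<\; t + 4\cdot\mathit{Cycle}.
\end{aligned}
```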

From that time on, all correct nodes will invoke pulses within $3d$ of each other, and their next pulse will be in the range stated by Lemma 3.12. The lemma immediately implies the bound $\mathit{cycle}_{\max} = \mathit{Cycle} + 9d$. Similarly, it states that after $t_0 + 9d$ no "Propose-Pulse" will be sent before $t_0 + \mathit{Cycle} - 2d$; thus the shortest possible time span between pulses at a node is $\mathit{Cycle} - 11d$. This implies that $\mathit{cycle}_{\min} = \mathit{Cycle} - 11d$.
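The value of $\mathit{cycle}_{\min}$ is the gap between the latest pulse of the current round (before $t_0 + 9d$) and the earliest possible next "Propose-Pulse" (at $t_0 + \mathit{Cycle} - 2d$):

```latex
\mathit{cycle}_{\min} = (t_0 + \mathit{Cycle} - 2d) - (t_0 + 9d) = \mathit{Cycle} - 11d .
```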
Moreover, the discussion also implies that: 

Lemma 3.13 Once the conditions of Lemma 3.12 hold, no correct node will invoke more than a single pulse in any real-time interval of length $\mathit{cycle}_{\min}$, and it will invoke at least one pulse in every real-time interval of length $\mathit{cycle}_{\max}$.
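Lemma 3.13 can be phrased as a property of one node's sequence of pulse times. The sketch below is a hypothetical trace checker. It assumes the real-time intervals in the lemma are half-open, in which case "at most one pulse in every $\mathit{cycle}_{\min}$ interval" is equivalent to every gap between consecutive pulses being at least $\mathit{cycle}_{\min}$, and "at least one pulse in every $\mathit{cycle}_{\max}$ interval" (between the first and last observed pulse) is equivalent to every such gap being at most $\mathit{cycle}_{\max}$. The bounds $\mathit{Cycle} - 11d$ and $\mathit{Cycle} + 9d$ are the values given in the text.

```python
def cycle_bounds(cycle: float, d: float) -> tuple[float, float]:
    # cycle_min = Cycle - 11d and cycle_max = Cycle + 9d, as stated in the text.
    return cycle - 11 * d, cycle + 9 * d


def spacing_ok(pulses: list[float], cycle_min: float, cycle_max: float) -> bool:
    """Check Lemma 3.13 on one node's pulse times (half-open windows assumed)."""
    ts = sorted(pulses)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return all(cycle_min <= g <= cycle_max for g in gaps)


if __name__ == "__main__":
    lo, hi = cycle_bounds(100, 1)                  # (89, 109)
    print(spacing_ok([0, 95, 190, 299], lo, hi))   # True: gaps 95, 95, 109
    print(spacing_ok([0, 80], lo, hi))             # False: gap below cycle_min
```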

This concludes the Convergence requirement with $\sigma = 3d$, since the correct nodes will always invoke pulses within $3d$ real-time units of each other. This completes the proof of Theorem 1. □
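The tightness part of the Convergence requirement, $\sigma = 3d$, can likewise be checked on a trace. The sketch assumes a hypothetical trace format in which each correct node's pulse times are listed in order and the $k$-th entries of all lists belong to the same round; pulses_tight is an illustrative name.

```python
def pulses_tight(per_node_pulses: list[list[float]], d: float) -> bool:
    """True if, in every round, all correct nodes pulse within sigma = 3d of each other."""
    sigma = 3 * d
    # zip(*...) groups the k-th pulse of every node; it stops at the shortest list.
    return all(max(rnd) - min(rnd) <= sigma for rnd in zip(*per_node_pulses))


if __name__ == "__main__":
    print(pulses_tight([[0, 100], [2, 101], [3, 103]], 1))  # True
    print(pulses_tight([[0, 100], [4, 100]], 1))            # False
```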

Theorem 2 (Closure) If the system is in a synchronized pulse_state at time $t_s$, then the system is in a synchronized pulse_state at every time $t > t_s$.

Proof: Let the system be in a synchronized pulse_state at the time immediately following the time at which the last correct node sent its "Propose-Pulse" message. Thus, all correct nodes have sent their "Propose-Pulse" messages. As a result, all will invoke their pulses within $3d$ of each other and will reset cycle_countdown to at least $\mathit{Cycle} - 2d$. The faulty nodes cannot cause the cycle length to be shorter than $\mathit{cycle}_{\min}$ or longer than $\mathit{cycle}_{\max}$. □

Thus we have proved the main theorem: 

Theorem 3 (Convergence and Closure) The Ab-Pulse-Synch algorithm solves the Self-stabilizing Pulse 
Synchronization Problem if the system remains coherent for at least 4 cycles. 

Proof: Convergence follows from Theorem 1. The first Closure condition follows from Theorem 2. The second Closure condition follows from Lemma 3.13. □

Since we defined non-faulty nodes to be considered correct within 2 cycles, we conclude:

Corollary 3.14 From an arbitrary state, once the network becomes correct and $n - f$ nodes are non-faulty, the Ab-Pulse-Synch algorithm solves the Self-stabilizing Pulse Synchronization Problem if the system remains so for at least 6 cycles.

Lemma 3.15 (Join of recovering nodes) If the system is in a synchronized state, a recovered node becomes synchronized with all correct nodes within $\Delta_{\mathrm{node}}$ time.

Proof: The proof follows the arguments used in the proofs leading to Theorem 3. Within a cycle of non-faulty behavior, the recovering node clears its variables and data structures of old values. Within $\mathit{cycle}_{\max}$ it will synchronize with all other correct nodes, though it might not issue a pulse if it issued one in the first Cycle. But by the end of $\Delta_{\mathrm{node}}$ its cycle_countdown will be synchronized with those of all the correct nodes, and it will consequently produce the next pulse in synchrony with them. □